The Dresden Surgical Anatomy Dataset for Abdominal Organ Segmentation in Surgical Data Science

Laparoscopy is an imaging technique that enables minimally-invasive procedures in various medical disciplines including abdominal surgery, gynaecology and urology. To date, publicly available laparoscopic image datasets are mostly limited to general classifications of data, semantic segmentations of surgical instruments and low-volume weak annotations of specific abdominal organs. The Dresden Surgical Anatomy Dataset provides semantic segmentations of eight abdominal organs (colon, liver, pancreas, small intestine, spleen, stomach, ureter, vesicular glands), the abdominal wall and two vessel structures (inferior mesenteric artery, intestinal veins) in laparoscopic view. In total, this dataset comprises 13195 laparoscopic images. For each anatomical structure, we provide over a thousand images with pixel-wise segmentations. Annotations comprise semantic segmentations of single organs and one multi-organ-segmentation dataset including segments for all eleven anatomical structures. Moreover, we provide weak annotations of organ presence for every single image. This dataset markedly expands the horizon for surgical data science applications of computer vision in laparoscopic surgery and could thereby contribute to a reduction of risks and faster translation of Artificial Intelligence into surgical practice.

in the context of surgical data science 11. In a clinical setting, such algorithms could facilitate context-dependent recognition and thereby protection of vulnerable anatomical structures, ultimately aiming at increased surgical safety and prevention of complications.
One major bottleneck in the development and clinical application of such AI-based assistance functions is the availability of annotated laparoscopic image data. To meet this challenge, we provide semantic segmentations, which describe the position of a specific structure by annotating each pixel of an image. Based on video data from 32 robot-assisted rectal resections or extirpations, this dataset offers a total of 13195 extensively annotated laparoscopic images displaying different intraabdominal organs (colon, liver, pancreas, small intestine, spleen, stomach, ureter, vesicular glands) and anatomical structures (abdominal wall, inferior mesenteric artery, intestinal veins). For a realistic representation of common laparoscopic obstacles, it features various levels of organ visibility including small or partly covered organ parts, motion artefacts, inhomogeneous lighting and smoke or blood in the field of view. Additionally, the dataset contains weak labels of organ visibility for each individual image.
Adding anatomical knowledge to laparoscopic data, this dataset bridges a major gap in the field of surgical data science and is intended to serve as a basis for a variety of machine learning tasks in the context of image recognition-based surgical assistance functions. Potential applications include the development of smart assistance systems through automated segmentation tasks, the establishment of unsupervised learning methods, or registration of preoperative imaging data (e.g. CT, MRI) with laparoscopic images for surgical navigation.

Methods
This dataset comprises annotations of eleven major abdominal anatomical structures: abdominal wall, colon, intestinal vessels (inferior mesenteric artery and inferior mesenteric vein with their subsidiary vessels), liver, pancreas, small intestine, spleen, stomach, ureter and vesicular glands.
Video recording. Between February 2019 and February 2021, video data from a total of 32 robot-assisted anterior rectal resections or rectal extirpations performed at the University Hospital Carl Gustav Carus Dresden were gathered and contributed to this dataset. The majority of patients (26/32) were male, the overall average age was 63 years and the mean body mass index (BMI) was 26.75 kg/m². [...] in MPEG-4 format and lasts between about two and ten hours. The local Institutional Review Board (ethics committee at the Technical University Dresden) reviewed and approved this study (approval number: BO-EK-137042018). The trial for which this dataset was acquired was registered on clinicaltrials.gov (trial registration ID: NCT05268432). Written informed consent to laparoscopic image data acquisition, data annotation, data analysis, and anonymized data publication was obtained from all participants. Before publication, all data were anonymized according to the General Data Protection Regulation of the European Union.
For anatomical structures without a temporal annotation (abdominal wall, colon, intestinal vessels, small intestine and vesicular glands), sequences displaying the specific organ were selected and merged manually using LosslessCut version 3.20.1 (developed by Mikael Finstad). Random frames were extracted from the merged video file using a Python script (see section "Code availability"). The extraction rate (extracted frames per second) was adjusted depending on the duration of the merged video to extract up to 100 images per organ per surgery. Images were stored in PNG format at a resolution of 1920 × 1080 pixels.
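The relationship between video duration, extraction rate and the cap of 100 images can be illustrated with a minimal sketch. This is not the authors' script: the function names, the fixed random seed and the choice to sample frame indices without replacement are assumptions made purely for illustration.

```python
import random


def extraction_rate(duration_s: float, max_images: int = 100) -> float:
    """Extracted frames per second needed to obtain max_images from the video."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return max_images / duration_s


def sample_frame_indices(total_frames: int, max_images: int = 100, seed: int = 0) -> list:
    """Pick up to max_images distinct random frame indices, in playback order.

    Hypothetical helper: the seed only makes the illustration reproducible.
    """
    rng = random.Random(seed)
    k = min(max_images, total_frames)
    return sorted(rng.sample(range(total_frames), k))


# A 250 s merged video needs 0.4 extracted frames per second for 100 images.
rate = extraction_rate(250.0)
indices = sample_frame_indices(total_frames=15000)
```

A longer merged video therefore yields a lower extraction rate, while a video with fewer frames than the cap simply contributes all of its frames.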
For liver, pancreas, spleen, stomach and ureter, temporal annotations served as a basis for the frame-extraction process using the above-mentioned Python script. Based on a TXT file with temporal annotations of organ presence, equidistant frames were extracted from respective sequences for each organ as outlined above. The resulting frames were audited and images that were not usable (e.g. the organ is not visible because it is concealed completely by an instrument, the complete field of view is filled with smoke, severely limited visibility due to a blurred camera) were excluded manually.
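Equidistant extraction from annotated sequences can be sketched as follows. The layout of the TXT file is not described here, so the format used below (one line per sequence: a single-word organ name, start time and end time in seconds) is purely an assumption, as are the function names.

```python
def parse_annotations(text: str) -> dict:
    """Parse lines of the assumed form '<organ> <start_s> <end_s>'."""
    intervals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        organ, start, end = line.split()
        intervals.setdefault(organ, []).append((float(start), float(end)))
    return intervals


def equidistant_timestamps(intervals: list, max_images: int = 100) -> list:
    """Spread max_images timestamps evenly over the total annotated time."""
    total = sum(end - start for start, end in intervals)
    if total <= 0:
        return []
    step = total / max_images
    stamps = []
    for i in range(max_images):
        # Position along the concatenated sequences, mapped back to video time.
        offset = (i + 0.5) * step
        for start, end in intervals:
            length = end - start
            if offset < length:
                stamps.append(start + offset)
                break
            offset -= length
    return stamps


example = "liver 10 20\nliver 100 130\nspleen 5 6\n"
annotations = parse_annotations(example)
liver_stamps = equidistant_timestamps(annotations["liver"], max_images=4)
```

Each timestamp would then be passed to a video reader of choice to decode the corresponding frame.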
No automated filtering processes were applied to specifically select or avoid images (e.g. based on mutual information). To maintain the variability inherent to intraoperative imaging, no image preprocessing steps (such as adaptation of image intensity, contrast, or window size) were performed. Images were directly extracted from the videos recorded during surgery and converted into lossless PNG format. These images were then directly annotated. The resulting dataset includes over 1000 images from at least 20 surgeries for each anatomical structure (Fig. 1).

Semantic segmentation. For pixel-wise segmentation, we used 3D Slicer 4.11.20200930 (https://www.slicer.org) including the SlicerRT extension, an open-source medical image analysis software 12. The anatomical structures were manually semantically segmented with the Segment Editor function using a stylus-guided tablet computer running Microsoft Windows. The settings made during segmentation were "scissors", operation "fill inside", shape "free form", slice cut "symmetric". As a guideline we generated a segmentation protocol that describes inclusion criteria for each considered anatomical structure in detail (Supplementary File 2). Each individual image was semantically annotated according to this guideline by three medical students with basic experience in minimally-invasive surgery. Thus, exactly one specific anatomical structure was finally segmented in each image (e.g. the colon was pixel-wise annotated in each of the 1374 colon images). In addition, one multi-organ-segmentation dataset was created out of the 1430 stomach frames. The stomach dataset was chosen for this purpose because these images very often show various organs, such as the colon, small intestine or spleen as well as the abdominal wall. Subsequently, the three individual annotations were automatically merged (see section "Code availability"). Individual annotations alongside merged segments were reviewed and adjusted by a physician with three years of experience in minimally-invasive surgery. Figure 1 gives an overview of the image generation and verification process. Example annotations are provided in Fig. 2.
Weak labeling. Weak labels provide information about the visibility of different anatomical structures in the entire image. In each frame, weak labels were annotated by one medical student with basic experience in minimally-invasive surgery and reviewed by a second one (Fig. 1).
The complete dataset is accessible at figshare 13.

Technical Validation
To merge the annotations of the three different annotators for each image in the dataset, the STAPLE algorithm 14, which is commonly used for merging different segmentations in biomedical problems, was applied. Each annotator received the same weight. The merged annotations were then, together with the original segmentations of the annotators, uploaded to a segmentation and annotation platform called CVAT (https://github.com/openvinotoolkit/cvat) 15 hosted at the National Center for Tumor Diseases (NCT/UCC) Dresden. The physician in charge of reviewing the data could then log in, select the most appropriate annotations for each image and, if necessary, adjust them.
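To make the merging step more concrete, the following is a simplified sketch of binary STAPLE as an expectation-maximization loop over flattened masks. It is not the authors' merging script (see section "Code availability"). The function name, the initial value of 0.9, the fixed iteration count, and the reading of "same weight" as identical initial sensitivity and specificity for every annotator are all assumptions.

```python
def staple_binary(ratings, iterations=50, init=0.9, eps=1e-6):
    """Simplified binary STAPLE on flattened masks (lists of 0/1, one per annotator).

    Returns the merged mask plus each annotator's estimated sensitivity
    and specificity. Illustrative only; assumes equally long, non-empty masks.
    """
    def clamp(x):
        return min(max(x, eps), 1.0 - eps)

    n_pixels = len(ratings[0])
    # Global foreground prior: fraction of all votes that are foreground.
    prior = clamp(sum(map(sum, ratings)) / (len(ratings) * n_pixels))
    sens = [init] * len(ratings)  # identical start values for all annotators
    spec = [init] * len(ratings)
    weights = [prior] * n_pixels

    for _ in range(iterations):
        # E-step: probability that each pixel is truly foreground.
        for i in range(n_pixels):
            fg, bg = prior, 1.0 - prior
            for j, rater in enumerate(ratings):
                if rater[i]:
                    fg *= sens[j]
                    bg *= 1.0 - spec[j]
                else:
                    fg *= 1.0 - sens[j]
                    bg *= spec[j]
            weights[i] = fg / (fg + bg)
        # M-step: re-estimate each annotator's performance parameters.
        w_sum = sum(weights)
        for j, rater in enumerate(ratings):
            tp = sum(w for w, d in zip(weights, rater) if d)
            tn = sum(1.0 - w for w, d in zip(weights, rater) if not d)
            sens[j] = clamp(tp / max(w_sum, eps))
            spec[j] = clamp(tn / max(n_pixels - w_sum, eps))

    merged = [1 if w >= 0.5 else 0 for w in weights]
    return merged, sens, spec


# Three annotators labelling six pixels; the second under-segments,
# the third over-segments.
raters = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
merged, sens, spec = staple_binary(raters)
```

In this toy case the merged mask coincides with a majority vote. The two can differ once the estimated sensitivities and specificities diverge strongly between annotators.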
To evaluate the extent of agreement between the segmentations of the individual annotators and the merged annotation with the final annotation of each image, we computed two standard metrics for segmentation comparison 16:
• F1 score, which quantifies the overlap of two annotations with a value from 0 to 1 (0: no overlap, 1: complete overlap).
• Hausdorff distance, a distance metric that calculates the maximum distance between a reference annotation and another segmentation. Here, we normalized the Hausdorff distance by the image diagonal, resulting in values between 0 and 1, where 0 indicates that there is no separation between the two segmentations and 1 indicates the maximum possible distance between the two.
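Both metrics can be sketched in a few lines for small binary masks given as nested lists. This is an illustration rather than the published analysis code. The symmetric form of the Hausdorff distance, the handling of empty masks and all function names are assumptions, and the brute-force distance search is only practical for small inputs.

```python
import math


def foreground(mask):
    """Set of (row, col) coordinates of all non-zero pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def f1_score(reference, segmentation):
    """Pixel-wise F1 (Dice) overlap between two binary masks."""
    a, b = foreground(reference), foreground(segmentation)
    if not a and not b:
        return 1.0  # assumption: two empty masks agree completely
    return 2 * len(a & b) / (len(a) + len(b))


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def normalized_hausdorff(reference, segmentation):
    """Symmetric Hausdorff distance divided by the image diagonal."""
    a, b = foreground(reference), foreground(segmentation)
    if not a or not b:
        # Assumption: the distance is undefined for an empty mask.
        return 0.0 if not a and not b else 1.0
    diagonal = math.hypot(len(reference), len(reference[0]))
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a)) / diagonal


ref = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
seg = [[1, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
f1 = f1_score(ref, seg)
hd = normalized_hausdorff(ref, seg)
```

Here the segmentation adds one pixel to a two-pixel reference, giving an F1 score of 0.8 and a Hausdorff distance of one pixel before normalization.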
The results of this comparison can be found in Table 3, sorted according to the different tissue types. The table shows that for most organs there is no large discrepancy between the merged annotations and the final product. Most F1 scores are over 0.9, indicating a large overlap, and the low values for the Hausdorff distance indicate that no tendencies for over- or under-segmentation were present. Only the F1 score for the ureter class seems to indicate that the expert annotator had to intervene regularly, though the difference still seems to be minimal as indicated by the low Hausdorff distance.
Most annotators also seemed to agree regularly with the final annotation, though not always to the same degree as the merged annotation, justifying the fusion via STAPLE. Similar to the merged annotations, there were larger discrepancies with regard to the ureter class. Generally though, at least two annotators seemed to largely agree with the expert annotations.

Usage Notes
The provided dataset is publicly available for non-commercial usage under the Creative Commons Attribution (CC-BY) license. If readers wish to use or reference this dataset, they should cite this paper.
The dataset can be used for various purposes in the field of machine learning. On the one hand, it can be used as a source of further image material in combination with other, already existing datasets. On the other hand, it can be used to create organ detection algorithms working either with weak labels or with semantic segmentation masks, for example as a basis for further development of assistance applications 17. Proposed training-validation-test splits as well as results of detailed segmentation studies are reported in a separate publication 18.

Code availability
The scripts for frame extraction, annotation merging, and statistical analysis, as well as the results of the statistical analysis, are made public on https://gitlab.com/nct_tso_public/dsad and via https://zenodo.org/record/6958337#.YvIsP3ZBxaQ. All code is written in Python 3 and freely accessible.